Chapter 1
The Nature of Probability and Statistics


Chapter 1 Overview 


1-1 Descriptive and Inferential Statistics 

1-2 Variables and Types of Data 

1-3 Data Collection and Sampling Techniques 
1-4 Experimental Design 


1-5 Computers and Calculators


Objectives 
After completing this chapter, you should be able to:


Demonstrate knowledge of statistical terms 


Differentiate between the two branches of statistics 

Identify types of data 

Identify the measurement level for each variable 

Identify the four basic sampling techniques

Explain the difference between an observational and an experimental study

Explain how statistics can be used and misused

Explain the importance of computers and calculators in statistics


Statistics is the science of conducting studies to

• collect,
• organize,
• summarize,
• analyze, and
• draw conclusions from data.


1-1 Descriptive Statistics and Inferential Statistics 


Statistics is divided into two main areas, depending on how data are used.
The two areas are
• Descriptive statistics
• Inferential statistics


Descriptive Statistics 


Descriptive statistics consists of the
• collection,
• organization,
• summarization, and
• presentation of data.


Inferential Statistics 


Inferential statistics consists of generalizing from samples to populations,
performing estimations and hypothesis tests, determining relationships among
variables, and making predictions.


• A variable is a characteristic or attribute that can assume different values.
• A population consists of all subjects (human or otherwise) that are being studied.
• A sample is a group of subjects selected from a population.


1-2 Variables and Types of Data 


The classification of variables can be summarized as follows: 


Qualitative (categorical)

Quantitative (numerical)
    Discrete: countable (5, 29, etc.)
    Continuous: can be measured (3.5, 302.6, etc.)


1-2 Variables and Types of Data 


• Qualitative variables are variables that have distinct categories according to some characteristic or attribute.
  For example: blood type, voting opinion, brand of car, gender.
• Quantitative variables are variables that can be counted or measured.
  For example: number of pizzas sold in one day, test score, temperature, and income.
• Discrete variables assume values that can be counted.
• Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring and often include fractions and decimals.


Levels of Measurement 


> Nominal - categorical (names)
> Ordinal - nominal, plus can be ranked (order)
> Interval - ordinal, plus precise differences between units of measure
> Ratio - interval, plus ratios are consistent, true zero


Examples for Levels of Measurement 


Variable      Nominal  Ordinal  Interval  Ratio  Level
Hair color    Yes                                Nominal
Pizza size    Yes      Yes                       Ordinal
Height        Yes      Yes      Yes       Yes    Ratio
Age           Yes      Yes      Yes       Yes    Ratio
Temperature   Yes      Yes      Yes       No     Interval


Data Sources 


Primary: Data Collection
Secondary: Data Compilation (Print or Electronic)


Population and Sample 


A population is the set of all items or individuals of interest.
• Examples: All likely voters in the next election
            All parts produced today
            All sales receipts for November

A sample is a subset of the population.
• Examples: 1000 voters selected at random for interview
            A few parts selected for destructive testing
            Every 100th receipt selected for audit


Why Sample?

• Less time consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of sufficiently high precision based on samples.


Sampling Techniques 


[Diagram: overview of sampling techniques; surviving labels: Probability Samples, Random, Convenience, Stratified]


Statistical Sampling 


Items of the sample are chosen based on known or calculable probabilities 


Probability Samples

• Simple Random Sampling
• Stratified Sampling
• Systematic Sampling
• Cluster Sampling


Simple Random Sample 


• Every individual or item from the population has an equal chance of being selected
• Selection may be with replacement or without replacement
• Samples can be obtained from a table of random numbers or from computer random number generators
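The points above can be sketched with a computer random number generator. This is a minimal illustration only: the population of ID numbers 1 to 100, the sample size of 10, and the fixed seed are assumptions made for the example, not part of the slides.

```python
import random

# Hypothetical sampling frame: ID numbers 1..100 stand in for the population.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Without replacement: no individual can be chosen twice.
sample_without = rng.sample(population, k=10)

# With replacement: the same individual may appear more than once.
sample_with = rng.choices(population, k=10)

print(sample_without)
print(sample_with)
```

`random.sample` draws without replacement and `random.choices` draws with replacement, which mirrors the two selection options listed above.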


Stratified Sample 


• The population is divided into subgroups (called strata) according to some common characteristic
• A simple random sample is selected from each subgroup
• Samples from the subgroups are combined into one
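The three steps of stratified sampling can be sketched as follows. The population, the stratum labels "freshman" and "senior", and the helper name `stratified_sample` are all hypothetical, chosen only to make the example runnable.

```python
import random

# Hypothetical population: (id, stratum) pairs; the strata labels are made up.
population = [(i, "freshman" if i % 2 else "senior") for i in range(1, 41)]

def stratified_sample(items, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from each stratum."""
    strata = {}
    for ident, stratum in items:
        strata.setdefault(stratum, []).append(ident)   # divide into subgroups
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))  # sample each subgroup
    return combined                                        # combine into one

sample = stratified_sample(population, per_stratum=3, rng=random.Random(1))
print(sample)
```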


Systematic Sample 


• Decide on the sample size: n
• Divide the frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
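A short sketch of the systematic procedure, assuming a hypothetical frame of N = 100 individuals and n = 10 (so k = 10). The function name `systematic_sample` is an illustrative choice, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every k-th individual after a random start in the first group."""
    k = len(frame) // n           # group size k = N / n
    start = rng.randrange(k)      # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))       # hypothetical frame of N = 100 individuals
sample = systematic_sample(frame, n=10, rng=random.Random(7))
print(sample)
```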
Cluster Sample

• The population is divided into several clusters, each representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique


[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
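Cluster sampling can be sketched in the same style. The 16 clusters of 10 items each and the choice of 4 clusters are assumptions for illustration; here every item in a selected cluster is used.

```python
import random

# Hypothetical population of 160 items split into 16 clusters of 10.
clusters = {c: list(range(c * 10, c * 10 + 10)) for c in range(16)}

rng = random.Random(3)
chosen = rng.sample(sorted(clusters), k=4)   # simple random sample of clusters

# Here every item in each selected cluster is used.
sample = [item for c in chosen for item in clusters[c]]
print(chosen, len(sample))
```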
Key Definitions

• A population is the entire collection of things under consideration
• A parameter is a summary measure computed to describe a characteristic of the population
• A sample is a portion of the population selected for analysis
• A statistic is a summary measure computed to describe a characteristic of the sample


Inferential Statistics 


Making statements about a population by examining sample results 


Sample statistics (known) -> Inference -> Population parameters (unknown, but can be estimated from sample evidence)


Inferential Statistics 


Drawing conclusions and/or making decisions concerning a population based on sample results. 


Estimation
• e.g.: Estimate the population mean weight using the sample mean weight

Hypothesis Testing
• e.g.: Use sample evidence to test the claim that the population mean weight is 120 pounds
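As a rough sketch of both ideas, the code below computes a point estimate of the mean and a one-sample t statistic for the claim of 120 pounds. The eight weights are invented for illustration, and the t statistic is one common choice of test statistic; the slides do not say which test is intended.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (invented for illustration).
weights = [118, 125, 121, 130, 115, 122, 119, 127]

n = len(weights)
mean = statistics.mean(weights)          # point estimate of the population mean
sd = statistics.stdev(weights)           # sample standard deviation

# Test statistic for the claim that the population mean is 120 pounds.
t = (mean - 120) / (sd / math.sqrt(n))
print(mean, round(t, 2))
```

A t value far from zero would count as evidence against the claim; how far is "far" depends on the significance level and the degrees of freedom (n - 1).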


Data Types 


Qualitative (Categorical)
  Examples:
  • Marital Status
  • Political Party
  • Eye Color
  (Defined categories)

Quantitative (Numerical)
  Discrete                      Continuous
  Examples:                     Examples:
  • Number of Children          • Weight
  • Defects per hour            • Voltage
  (Counted items)               (Measured characteristics)


Data Types 


Time Series Data
• Ordered data values observed over time

Cross Section Data
• Data values observed at a fixed point in time


Data Types 


Sales (in $1000's)

            2003   2004   2005   2006
Atlanta      435    460    475    490
Boston       320    345    375    395
Cleveland    405    300    410    395
Denver       260    270    285    280

Each row is time series data; each column is cross section data.


Data Measurement Levels 


Measurements -> Ratio/Interval Data: Highest Level, Complete Analysis
Rankings, Ordered Categories -> Ordinal Data: Higher Level, Mid-level Analysis
Categorical Codes, ID Numbers, Category Names -> Nominal Data: Lowest Level, Basic Analysis
Introduction
• 2-1 Organizing Data
• 2-2 Histograms, Frequency Polygons, and Ogives
• 2-3 Other Types of Graphs
• 2-4 Paired Data and Scatter Plots

Objectives

Organize data using frequency distributions.

Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.

Represent data using Pareto charts, time series graphs, and pie graphs.

Draw and interpret a stem and leaf plot.

Draw and interpret a scatter plot for a set of paired data.
2-1 Organizing Data
• Data collected in original form are called raw data.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• Nominal- or ordinal-level data that can be placed in categories are organized in categorical frequency distributions.
Categorical Frequency Distribution

• Twenty-five army inductees were given a blood test to determine their blood type.
• Raw Data: A, B, B, AB, O, O, O, B, AB, B, B, B, O, A, O, A, O, O, O, AB, AB, A, O, B, A
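A categorical frequency distribution for the blood type data can be tallied with a short script. This is a sketch rather than the textbook's procedure; the percent column is an added convenience, and the variable names are illustrative.

```python
from collections import Counter

# The 25 blood types from the raw data above.
raw = ("A B B AB O O O B AB B B B O A O A O O O "
       "AB AB A O B A").split()

freq = Counter(raw)          # class -> frequency
total = len(raw)

for blood_type in ["A", "B", "O", "AB"]:
    count = freq[blood_type]
    print(f"{blood_type:<3}{count:>3}{100 * count / total:>6.0f}%")
```

The tally gives frequencies of 5, 7, 9, and 4 for types A, B, O, and AB, which add up to the 25 inductees.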
Grouped Frequency Distribution

• Grouped frequency distributions are used when the range of the data is large.
• The smallest and largest possible data values in a class are the lower and upper class limits. Class boundaries separate the classes.
• To find a class boundary, average the upper class limit of one class and the lower class limit of the next class.
Grouped Frequency Distribution

• The class width can be calculated by subtracting
  ○ successive lower class limits (or boundaries)
  ○ successive upper class limits (or boundaries)
  ○ the upper and lower class boundaries of a class
• The class midpoint X_m can be calculated by averaging
  ○ the upper and lower class limits (or boundaries)
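The definitions of class boundary, class width, and class midpoint can be checked numerically. The three classes below (100-104, 105-109, 110-114) are illustrative values, and the sketch assumes integer data so that adjacent limits differ by 1.

```python
# Illustrative class limits; integer data assumed.
limits = [(100, 104), (105, 109), (110, 114)]

# Boundary between adjacent classes: average the upper limit and next lower limit.
gap = limits[1][0] - limits[0][1]          # 105 - 104 = 1
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

width = limits[1][0] - limits[0][0]        # successive lower limits: 105 - 100
midpoints = [(lo + hi) / 2 for lo, hi in limits]

print(boundaries, width, midpoints)
```

The same width results from subtracting the boundaries of one class (104.5 - 99.5 = 5), and averaging the boundaries gives the same midpoints as averaging the limits.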
Rules for Classes in Grouped Frequency Distributions

• There should be 5-20 classes.
• The class width should be an odd number.
• The classes must be mutually exclusive.
• The classes must be continuous.
• The classes must be exhaustive.
• The classes must be equal in width (except in open-ended distributions).
• Class width = Range / number of classes
Constructing a Grouped Frequency Distribution

The following data represent the record high temperatures for each of the 50 states. Construct a grouped frequency distribution for the data using 7 classes.

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Range < 10: ungrouped frequency distribution
Range > 10: grouped frequency distribution

Class limits   Tally
100-104
105-109
110-114
115-119
120-124
125-129
130-134
Constructing a Grouped Frequency Distribution

STEP 1 Determine the classes.

Find the class width by dividing the range by the number of classes, 7.

Range = High - Low
      = 134 - 100 = 34

Class Width = Range/7 = 34/7 ≈ 4.9, which rounds up to 5

Rounding Rule: Always round up if there is a remainder.
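Step 1 can be reproduced directly from the 50 temperatures. This is a sketch under the stated rounding rule; `math.ceil` is used here to "round up if there is a remainder", and the variable names are illustrative.

```python
import math

temps = [112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
         110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
         107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
         116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
         120, 113, 120, 117, 105, 110, 118, 112, 114, 114]

num_classes = 7
data_range = max(temps) - min(temps)           # High - Low
width = math.ceil(data_range / num_classes)    # always round up

print(data_range, width)
```

With a range of 34 and 7 classes the width comes out as 5, which also satisfies the rule that the class width should be odd.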
Constructing a Grouped Frequency Distribution

• For convenience, we will choose the lowest data value, 100, for the first lower class limit.
• The subsequent lower class limits are found by adding the width to the previous lower class limits.
• The first upper class limit is one less than the next lower class limit.
• The subsequent upper class limits are found by adding the width to the previous upper class limits.

Class Limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
Constructing a Grouped Frequency Distribution

• The class boundary is midway between an upper class limit and the subsequent lower class limit. For 104 and 105, the boundary is 104.5.

Half the gap between adjacent limits is (105 - 104)/2 = 0.5, so
lower boundary = lower limit - 0.5
upper boundary = upper limit + 0.5

Class Limits   Class Boundaries
100 - 104       99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5

Bluman, Chapter 2 13 


Constructing a Grouped Frequency Distribution 


STEP 2 Tally the data.
STEP 3 Find the frequencies.

Class Limits   Class Boundaries   Frequency   Class Midpoint
100 - 104       99.5 - 104.5          2           102
105 - 109      104.5 - 109.5          8           107
110 - 114      109.5 - 114.5         18           112
115 - 119      114.5 - 119.5         13           117
120 - 124      119.5 - 124.5          7           122
125 - 129      124.5 - 129.5          1           127
130 - 134      129.5 - 134.5          1           132
Constructing a Grouped Frequency Distribution

STEP 4 Find the cumulative frequencies by keeping a running total of the frequencies.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104       99.5 - 104.5          2            2
105 - 109      104.5 - 109.5          8           10
110 - 114      109.5 - 114.5         18           28
115 - 119      114.5 - 119.5         13           41
120 - 124      119.5 - 124.5          7           48
125 - 129      124.5 - 129.5          1           49
130 - 134      129.5 - 134.5          1           50
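Steps 2 to 4 can be sketched in a few lines: count the values that fall within each pair of class limits, then keep a running total. The variable names are illustrative, and `itertools.accumulate` stands in for the manual running total.

```python
from itertools import accumulate

temps = [112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
         110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
         107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
         116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
         120, 113, 120, 117, 105, 110, 118, 112, 114, 114]

lower_limits = range(100, 135, 5)   # 100, 105, ..., 130 (class width 5)

# Frequency of each class: values between the lower and upper class limits.
freqs = [sum(lo <= t <= lo + 4 for t in temps) for lo in lower_limits]

# Cumulative frequency: running total of the frequencies.
cum = list(accumulate(freqs))

for lo, f, c in zip(lower_limits, freqs, cum):
    print(f"{lo}-{lo + 4}  {f:>2}  {c:>2}")
```

The last cumulative frequency should equal the number of data values, 50, which is a quick check that the classes are exhaustive and mutually exclusive.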


Chapter 2

Overview

• 2-2 Histograms, Frequency Polygons, and Ogives

Chapter 2
Objectives

• Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.
Histograms, Frequency Polygons, and Ogives

3 Most Common Graphs in Research

1. Histogram
2. Frequency Polygon
3. Cumulative Frequency Polygon (Ogive)
Histograms, Frequency Polygons, and Ogives

The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies of the classes.

The class boundaries are represented on the horizontal axis.
Histograms

Histograms use class boundaries and frequencies of the classes.

Class Limits   Class Boundaries   Frequency
100 - 104       99.5 - 104.5          2
105 - 109      104.5 - 109.5          8
110 - 114      109.5 - 114.5         18
115 - 119      114.5 - 119.5         13
120 - 124      119.5 - 124.5          7
125 - 129      124.5 - 129.5          1
130 - 134      129.5 - 134.5          1
Histograms

Histograms use class boundaries and frequencies of the classes.

[Figure: histogram titled "Record High Temperatures"; x axis: Temperature (°F), marked at the class boundaries 99.5 to 134.5; y axis: frequency]
2.2 Histograms, Frequency Polygons, and Ogives

• The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the class midpoints. The frequencies are represented by the heights of the points.
• The class midpoints are represented on the horizontal axis.
Frequency Polygons

Frequency polygons use class midpoints and frequencies of the classes.

Class Limits   Class Midpoints   Frequency
100 - 104          102               2
105 - 109          107               8
110 - 114          112              18
115 - 119          117              13
120 - 124          122               7
125 - 129          127               1
130 - 134          132               1
Frequency Polygons

Frequency polygons use class midpoints and frequencies of the classes.

A frequency polygon is anchored on the x-axis before the first class and after the last class.

[Figure: frequency polygon titled "Record High Temperatures"; x axis: Temperature (°F), marked at the class midpoints 102 to 132; y axis: Frequency]
2.2 Histograms, Frequency Polygons, and Ogives

• The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
• The upper class boundaries are represented on the horizontal axis.
Ogives

Ogives use upper class boundaries and cumulative frequencies of the classes.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104       99.5 - 104.5          2            2
105 - 109      104.5 - 109.5          8           10
110 - 114      109.5 - 114.5         18           28
115 - 119      114.5 - 119.5         13           41
120 - 124      119.5 - 124.5          7           48
125 - 129      124.5 - 129.5          1           49
130 - 134      129.5 - 134.5          1           50
Ogives

Ogives use upper class boundaries and cumulative frequencies of the classes.

Class Boundaries   Cumulative Frequency
Less than 104.5         2
Less than 109.5        10
Less than 114.5        28
Less than 119.5        41
Less than 124.5        48
Less than 129.5        49
Less than 134.5        50
Ogives

Ogives use upper class boundaries and cumulative frequencies of the classes.

[Figure: ogive titled "Record High Temperatures"; x axis: Temperature (°F), marked at the class boundaries 99.5 to 134.5; y axis: Cumulative frequency]
Procedure Table
Constructing Statistical Graphs

1: Draw and label the x and y axes.
2: Choose a suitable scale for the frequencies or cumulative frequencies, and label it on the y axis.
3: Represent the class boundaries for the histogram or ogive, or the midpoint for the frequency polygon, on the x axis.
4: Plot the points and then draw the bars or lines.
Histograms, Frequency Polygons, and Ogives

If proportions are used instead of frequencies, the graphs are called relative frequency graphs.

Relative frequency graphs are used when the proportion of data values that fall into a given class is more important than the actual number of data values that fall into that class.
Construct a histogram, frequency polygon, and ogive using relative frequencies for the distribution (shown here) of the miles that 20 randomly selected runners ran during a given week.

Class Boundaries   Frequency
 5.5 - 10.5            1
10.5 - 15.5            2
15.5 - 20.5            3
20.5 - 25.5            5
25.5 - 30.5            4
30.5 - 35.5            3
35.5 - 40.5            2
Histograms

The following is a frequency distribution of miles run per week by 20 selected runners.

Divide each frequency by the total frequency to get the relative frequency.

Class Boundaries   Frequency   Relative Frequency
 5.5 - 10.5            1         1/20 = 0.05
10.5 - 15.5            2         2/20 = 0.10
15.5 - 20.5            3         3/20 = 0.15
20.5 - 25.5            5         5/20 = 0.25
25.5 - 30.5            4         4/20 = 0.20
30.5 - 35.5            3         3/20 = 0.15
35.5 - 40.5            2         2/20 = 0.10
                    Σf = 20      Σrf = 1.00
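The relative frequency column is a one-line calculation. A minimal sketch, using the runner frequencies recovered from the fractions in the table (1, 2, 3, 5, 4, 3, 2); the variable names are illustrative.

```python
# Frequencies for the seven classes of miles run per week.
freqs = [1, 2, 3, 5, 4, 3, 2]

total = sum(freqs)                      # total frequency
rel = [f / total for f in freqs]        # relative frequency of each class

for f, r in zip(freqs, rel):
    print(f"{f}/{total} = {r:.2f}")
```

The relative frequencies should sum to 1 (up to floating-point rounding), which is a useful check on the arithmetic.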
Histograms

Use the class boundaries and the relative frequencies of the classes.

[Figure: histogram titled "Histogram for Runners' Times"; x axis: Miles, marked at the class boundaries 5.5 to 40.5; y axis: Relative frequency]
Frequency Polygons

The following is a frequency distribution of miles run per week by 20 selected runners.

Class Boundaries   Class Midpoints   Relative Frequency
 5.5 - 10.5             8                0.05
10.5 - 15.5            13                0.10
15.5 - 20.5            18                0.15
20.5 - 25.5            23                0.25
25.5 - 30.5            28                0.20
30.5 - 35.5            33                0.15
35.5 - 40.5            38                0.10


Frequency Polygons

Use the class midpoints and the relative frequencies of the classes.

[Figure: frequency polygon titled "Frequency Polygon for Runners' Times"; x axis: Miles; y axis: Relative frequency, up to 0.25]
Ogives

The following is a frequency distribution of miles run per week by 20 selected runners.

Class Boundaries   Frequency   Cumulative Frequency   Cum. Rel. Frequency
 5.5 - 10.5            1              1                 1/20 = 0.05
10.5 - 15.5            2              3                 3/20 = 0.15
15.5 - 20.5            3              6                 6/20 = 0.30
20.5 - 25.5            5             11                11/20 = 0.55
25.5 - 30.5            4             15                15/20 = 0.75
30.5 - 35.5            3             18                18/20 = 0.90
35.5 - 40.5            2             20                20/20 = 1.00
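The cumulative relative frequencies in the table can be reproduced as below. This is a sketch with illustrative variable names; the upper class boundaries are generated from the first boundary 10.5 and the class width 5.

```python
from itertools import accumulate

freqs = [1, 2, 3, 5, 4, 3, 2]                    # runner frequencies
upper_boundaries = [10.5 + 5 * i for i in range(len(freqs))]

cum = list(accumulate(freqs))                    # cumulative frequencies
cum_rel = [c / cum[-1] for c in cum]             # divide by the total, 20

for ub, c, cr in zip(upper_boundaries, cum, cum_rel):
    print(f"Less than {ub}: {c}/{cum[-1]} = {cr:.2f}")
```

These (upper boundary, cumulative relative frequency) pairs are the points plotted on a relative frequency ogive.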
Ogives

Ogives use upper class boundaries and cumulative frequencies of the classes.

Class Boundaries   Cumulative Frequency   Cum. Rel. Frequency
Less than 10.5            1                    0.05
Less than 15.5            3                    0.15
Less than 20.5            6                    0.30
Less than 25.5           11                    0.55
Less than 30.5           15                    0.75
Less than 35.5           18                    0.90
Less than 40.5           20                    1.00
Ogives

Use the upper class boundaries and the cumulative relative frequencies.

[Figure: ogive titled "Ogive for Runners' Times"; x axis: Miles, marked at the class boundaries from 5.5; y axis: Cumulative relative frequency]
Shapes of Distributions

[Figure: sketches of distribution shapes; surviving labels: Reverse J-shaped, Right-skewed, Left-skewed, Bimodal, U-shaped]
Chapter 2 Overview

• 2-3 Other Types of Graphs

Chapter 2 Objectives

• Represent data using Pareto charts, time series graphs, and pie graphs.


Bar Graphs

• When the data are qualitative or categorical, bar graphs can be used to represent the data.
• A bar graph can be drawn using either horizontal or vertical bars.
• A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.
• Draw and label the x and y axes. For the horizontal bar graph, place the frequency scale on the x axis, and for the vertical bar graph, place the frequency scale on the y axis.
FIGURE 2-9 Bar Graphs for Example 2-8

[Figure: horizontal and vertical bar graphs of first-year college student spending (average amount spent, $0 to $800) on shoes, clothing, dorm decor, and electronics]
Compound Bar Graphs

Bar graphs can also be used to compare data for two or more groups. These types of bar graphs are called compound bar graphs.

[Figure: compound bar graph titled "Never Married Adults"; x axis: Year (1960, 1980, 2000, 2010); y axis: Number (millions)]

This graph shows that there have consistently been more never married males than never married females and that the difference between the two groups has increased slightly over the last 50 years.


Pareto Chart

• When the variable displayed on the horizontal axis is qualitative or categorical, a Pareto chart can also be used to represent the data.
• A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
Pareto Charts

[Figure: Pareto chart titled "How People Get to Work"; x axis categories: Auto, Bus, Trolley, Train, Walk; y axis: Frequency]
Time Series Graphs

• When data are collected over a period of time, they can be represented by a time series graph.
• A time series graph represents data that occur over a specific period of time.

FIGURE 2-12 Figure for Example 2-10

[Figure: time series graph titled "Price for an Advertisement"; x axis: Year, 2010 to 2015; y axis: Cost (in millions)]


Compound Time Series Graph

Two or more data sets can be compared on the same graph, called a compound time series graph, if two or more lines are used.

[Figure: compound time series graph titled "Elderly in the U.S. Labor Force"; x axis: Year, 1960 to 2010; y axis: Percent]
Source: Bureau of Census, U.S. Department of Commerce.
Group Assignment
Exercises 2.1

Island     No. 1 and 13
Ocean      No. 2 and 14
Beach      No. 3
Mountain   No. 4
Horizon    No. 5 and 15
Forest     No. 6 and 16
Sky        No. 17
River      No. 18

pgdrs.yue@gmail.com
Chapter 2 Objectives

• Represent data using pie graphs, dotplots, and stem and leaf plots.
Pie Graphs

• Pie graphs are used extensively in statistics.
• The purpose of the pie graph is to show the relationship of the parts to the whole by visually comparing the sizes of the sections.
• Percentages or proportions can be used.
• The variable is nominal or categorical.
• A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
Pie Graphs

[Figure: pie graph titled "Marital Status of Employees at Brown's Department Store"]
Dotplots

• A dotplot uses points or dots to represent the data values.
• If the data values occur more than once, the corresponding points are plotted above one another.
• A dotplot is a statistical graph in which each data value is plotted as a point (dot) above the horizontal axis.
• Dotplots are used to show how the data values are distributed and to see if there are any extremely high or low data values.
FIGURE 2-16 Figure for Example 2-13

[Figure: dotplot; horizontal axis marked 5, 10, 15, 20, 25, 30]

The graph shows that the majority of the named storms occur with a frequency between 6 and 16 per year. There are only 3 years when there were 19 or more named storms per year.
Stem and Leaf Plots

• The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing.
• It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphical form.
• A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43  2 57 23
36 32 33 32 44
32 52 44 51 45
Unordered Stem Plot      Ordered Stem Plot
0 | 2                    0 | 2
1 | 3 4                  1 | 3 4
2 | 5 0 3                2 | 0 3 5
3 | 1 2 6 2 3 2 2        3 | 1 2 2 2 2 3 6
4 | 3 4 4 5              4 | 3 4 4 5
5 | 7 2 1                5 | 1 2 7
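The ordered stem plot can be generated programmatically. This sketch assumes two-digit data (the stem is the tens digit, the leaf the units digit) and reads the garbled entries in the cardiogram data as 25, 44, and 51, which is what the stem plots above imply. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

data = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
        36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); keep units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Sorting the values before splitting them is what turns the unordered plot into the ordered one.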
Excel Command (leaves in ascending order)

=REPT("0",COUNTIF($A$2:$A$11,F8*10+0))&REPT("1",COUNTIF($A$2:$A$11,F8*10+1))&REPT("2",COUNTIF($A$2:$A$11,F8*10+2))&REPT("3",COUNTIF($A$2:$A$11,F8*10+3))&REPT("4",COUNTIF($A$2:$A$11,F8*10+4))&REPT("5",COUNTIF($A$2:$A$11,F8*10+5))&REPT("6",COUNTIF($A$2:$A$11,F8*10+6))&REPT("7",COUNTIF($A$2:$A$11,F8*10+7))&REPT("8",COUNTIF($A$2:$A$11,F8*10+8))&REPT("9",COUNTIF($A$2:$A$11,F8*10+9))

Excel Command (leaves in descending order)

=REPT("9",COUNTIF($A$2:$A$11,F8*10+9))&REPT("8",COUNTIF($A$2:$A$11,F8*10+8))&REPT("7",COUNTIF($A$2:$A$11,F8*10+7))&REPT("6",COUNTIF($A$2:$A$11,F8*10+6))&REPT("5",COUNTIF($A$2:$A$11,F8*10+5))&REPT("4",COUNTIF($A$2:$A$11,F8*10+4))&REPT("3",COUNTIF($A$2:$A$11,F8*10+3))&REPT("2",COUNTIF($A$2:$A$11,F8*10+2))&REPT("1",COUNTIF($A$2:$A$11,F8*10+1))&REPT("0",COUNTIF($A$2:$A$11,F8*10+0))
Chapter 6

Methods of Data Collection

After reading this chapter, students should be able to:

> Understand the difference between qualitative and quantitative methods of data collection

> Describe various types of data collection methods, and state their uses and limitations

> Use an appropriate method or a combination of different methods for data collection

> Identify ethical issues involved in business research and the ways of ensuring that research informants or subjects are not harmed by the study


Chapter 6 


Overview 


6.1 Data Collection Method: Qualitative versus Quantitative 


Quantitative Techniques 
• 6.1.1 Observation
• 6.1.2 Survey Method
  ○ Personal Interview (Face to Face)
  ○ Mail Survey
  ○ Telephone Survey
  ○ Internet Survey


6 Introduction 


> Data collection methods are the means of collecting information about the objects under study in a systematic way.

> If data are collected haphazardly rather than systematically, it is difficult to answer research questions conclusively.

> Research can be divided into primary and secondary, based on the sources of data collection.

> Secondary research involves any information from published sources which has not been specifically collected for the current research problem. This includes published sources of data such as electronic databases, periodicals, company annual reports, etc.

> Primary data collection becomes necessary when a researcher is unable to find the data needed in secondary sources.

> Primary research involves collecting information specifically for the study in hand from the actual sources, such as consumers, users/non-users, or other entities involved in the research.

> Primary data are therefore collected fresh and for the first time, and are thus original in character.


6.1 Data Collection Method: Qualitative versus Quantitative 


> Qualitative research explores attitudes, behavior and experiences through methods such as interviews or focus groups.

> It attempts to get an in-depth opinion from participants.

> As qualitative research examines attitudes, behavior and experiences which are related to the personal information of participants, fewer people take part in the research, but the contact with these people tends to last a lot longer.

> There are many different methods, such as participant observation, in-depth interviews and focus group discussions.


6.1 Data Collection Method: Qualitative versus Quantitative 


> Quantitative research generates statistics through the use of large-scale survey research, using methods such as questionnaires or structured interviews.

> If a business analyst or market researcher has stopped you on the street, or you have filled in a questionnaire which arrived through the post, you have taken part in quantitative research.

> This type of research reaches many more people, but the contact with those people is much shorter than it is in qualitative research.


6.1.1 Observation 


> The observation method is the most commonly used method, especially in studies relating to the behavioural sciences.

> Observation is a technique that involves systematically selecting, watching and recording the behavior and characteristics of living beings, objects or phenomena.

> Under the observation method, the information is sought by way of the investigator's own direct observation, without the knowledge of the respondents.

> The main advantage of this method is that subjective bias is eliminated, as actual behaviours get recorded, and thus we get more accurate information.

> Secondly, the information obtained under this method relates to what is currently happening; it is not complicated by past behavior, future intentions or attitudes.

> Thirdly, this method is independent of respondents' willingness to respond and as such is relatively less demanding of active cooperation on the part of respondents.


While choosing observation as the method of data collection, the researcher should keep certain things in mind:

> What should be observed? How should the observations be recorded? How should the accuracy of the observations be ensured?

> If the observation is characterized by a careful definition of the units to be observed, the style of recording the observed information, standardized conditions of observation and the selection of pertinent data of observation, then the observation is known as structured observation.

> On the other hand, if the observation takes place without these characteristics being decided in advance, it is known as unstructured observation.

> Structured observation is considered appropriate in descriptive studies, whereas in an exploratory study the observation is more likely to be unstructured.


However, the observational method has certain limitations. 


> One of the major problems is that we cannot be sure a representative sample has been chosen for the study, because the recording of observational data takes place in public places and we have no control over who and how many are being observed at a given time.


6.1.2 Survey Method 


> A survey involves the collection of information from representative target respondents using a predesigned questionnaire.

> Unlike observation, it is a structured method of data collection in which we extract exactly the same information from all of the target population.

> There are basically four types of survey used by researchers, which are described below.


Personal interview (face to face) 


> Among all the survey methods, personal interview (face-to-face) is the most widely used by 
researchers all over the world. 


>In these interviews, trained interviewers administer structured questionnaires, asking
fixed-choice questions in a consistent format.


> In order to obtain a more accurate outcome of the overall survey, the interviewer should 
keep the following points in his/her mind. These important tips will ensure reliable, credible 


Mail survey 


>Imagine that you are interested in exploring the attitudes college students have about 
writing. 


>Since it would be impossible to interview every student on campus, choosing the mail-out
survey as your method would enable you to choose a large sample of college students.


>You might choose to limit your research to your own college or university, or you might 
extend your survey to several different institutions. 


> If your research question demands it, the mail survey allows you to sample a very broad 
group of subjects at a small cost. 


Telephonic survey 


>This is an alternative form of interview to the personal, face-to-face interview, where the 
interviewer collects the relevant information from the target respondents through telephonic 
conversation. 


> The following points should help in locating respondents and persuading them to
participate in the telephonic survey.


Internet survey 


> With the growth of the Internet and the expanded use of electronic mail for all
purposes, the electronic survey is becoming one of the most widely used survey
methods these days.


The electronic surveys can be done in many ways: 


>(i) The survey forms can be distributed as electronic mail messages through 
attachment to potential respondents, 


> (ii) the survey form can be posted as World Wide Web forms on the Internet, 
and 


> (iii) the survey form can be distributed via publicly available computers in high-traffic
areas such as libraries and shopping malls.


6.1.3 Qualitative Techniques 


Sometimes the research objective calls for more indirect methods of questioning,
because normal quantitative surveys are inadequate or inappropriate.


In such cases, qualitative methods, which probe the mind of the respondent, might
be useful.


The major requirement for qualitative techniques is that we need a behavioural 
specialist such as a psychologist or sociologist to analyze the findings. 


In qualitative techniques, analysis and interpretation are not as easy as they are
in quantitative studies.


If done by a non-expert, qualitative research can be completely misleading. Some
of the important tools are discussed in detail below.


In-depth interview 


In-depth interviewing is a qualitative research technique that involves conducting intensive
individual interviews with a small number of respondents to explore their perspectives on a
particular idea, programme or situation.


In-depth interviews are useful when detailed information about a person's thoughts and
behaviours is needed or new issues are to be explored in depth.


Interviews are often used to provide context to other data (such as outcome data), offering a 
complete picture of what happened in the programme and why. 


In-depth interviews should be used in place of focus groups if potential participants
cannot be included in a group or are not comfortable talking openly in one, or when one wants to
distinguish an individual's (as opposed to a group's) opinions about the programme.


They are often used to refine questions for future surveys of a particular group. 


The primary advantage of in-depth interviews is that they provide much more detailed information
than what is available through other data collection methods such as surveys.


Focus group discussion 


A focus group is a group of interacting individuals having some common interests or 
characteristics, brought together by a researcher, who uses the group and its interaction as a 
way to gain information about a specific or focused issue. 


The focus group is a carefully planned and moderated discussion to obtain meaningful
information on the area of interest in a non-threatening environment.


The focus group discussion is an unstructured method of data collection where the 
respondents express their views freely. 


It is mostly used for exploratory studies rather than conclusive studies.


Groups are composed of respondents who share similar concerns and responsibilities but
have minimal contact with each other in their daily lives.


As groups differ in their composition and dynamics, multiple groups should be organized to
obtain information from different perspectives on a given topic.


Groups typically contain approximately 6-12 people: large enough to provide for a range of 
views but small enough for everyone to contribute. 


Projective techniques 


Projective techniques are used by psychologists to infer respondents' underlying
motives, urges or intentions, which the respondents either resist revealing or are
unable to figure out themselves.


The respondent in supplying information tends unconsciously to project his/her own 
attitudes or feelings on the subject under study. 


Projective techniques play an important role in motivational studies or in attitude surveys. In 
the following paragraphs, we shall discuss some of the important projective techniques. 


6.2 Collection of Secondary Data 
Secondary data is indispensable for most organizational research. 


Secondary data refers to information that has already been gathered by someone (individuals or
agencies) and is available to the researcher.


The secondary data can be internal or external to the organization and it can be accessed through 
the Internet or perusal of recorded or published information. 


Generally, business research should be undertaken after a prior search of secondary sources. 


The secondary sources of information are important for any business research due to the 
following reasons. 


Secondary data may be available which is entirely appropriate and wholly adequate to draw 
conclusions and answer the question or solve the problem. Sometimes primary data 
collection simply is not necessary. 


It is far cheaper to collect secondary data than to obtain primary data. For the same level of
research budget, a thorough examination of secondary sources can yield a great deal more
information than can be gathered through a primary data collection exercise.




The time involved in searching secondary sources is much less than that needed to complete a
primary data collection exercise.


Secondary sources of information can yield more accurate data than that obtained through 
primary research. 


It should not be forgotten that secondary data can play a substantial role in the exploratory 
phase of the research when the task at hand is to define the research problem and to generate 
hypotheses. 


Secondary data can be extremely useful both in defining the population and in structuring 
the sample to be taken. 


Published secondary data are available from several sources:


(a) various publications of the central, state and local governments such as publications of 
economic indicators, census data and statistical abstracts; 


(b) various publications of foreign governments or international bodies and their subsidiary 
organizations; 


(c) books and periodicals; 


(d) reports and publications of various associations connected with business and industry, 
banks, stock exchanges, etc; 


(e) reports generated by research scholars of academic institutions in various fields; and 


(f) public records and statistics, historical documents and other sources of published 
information. 


Secondary data could also come from unpublished sources such as unpublished biographies
and autobiographies, unpublished research theses, working research papers of scholars, etc.




Researchers must be very careful in using secondary data, because the available data may not be
suitable or may be inadequate in the context of the problem under investigation. By way of caution,
the researcher, before using secondary data, must see that they possess the following
characteristics.


Reliability of data: Reliability can be tested by asking the following about the data: (a)
Who collected the data? (b) What were the sources of the data? (c) Were the data collected using
appropriate methods? (d) At what time were the data collected? (e) What level of accuracy was
desired, and how far was it achieved?


Suitability of data: The data that are suitable for one enquiry may not necessarily be found suitable for 
another enquiry. The researcher must carefully scrutinize the definition of various terms and units of 
collection used in the study before identification of relevant data from the published sources. 


Adequacy of data: If the level of accuracy in data is found inadequate for the purpose of present 
enquiry, data will be considered as inadequate and should not be used by a researcher. The data will 
also be considered as inadequate, if they are related to an area which may be either narrower or wider 
than the area of the present study. 


6.3 Selection of Appropriate Methods of Data Collection 


We have discussed various methods of data collection. Each method has its own advantages 
and disadvantages. The researcher must judiciously select the method(s) for his/her own 
study keeping in view the following factors. 


Nature, scope and object of enquiry 


Availability of funds 


Time factor 


Precision required 


6.4 Ethical Issues 


Several ethical issues should be addressed while collecting data. We should be concerned about
whether our procedures for collecting information are likely to cause any physical or
emotional harm. Such harm may be caused by:


violating participants' right to privacy by posing sensitive questions or by gaining access to
records which may contain personal data;


observing the behavior of participants without their being aware;


allowing personal information to be made public which participants would want to be kept
private; and


failing to observe or respect certain cultural values, traditions or taboos valued by the
participants.




Several methods for dealing with these issues may be recommended:


obtaining the respondent's consent before the study or the interview begins;


not exploring sensitive issues before a good relationship has been established with the
participant;


ensuring the confidentiality of the data obtained; and


learning enough about the culture of participants to ensure it is respected during the 
data collection process. 


If sensitive questions are asked, for example, about family planning or sexual 
practices, or about opinions of patients on the health services provided, it may be 
advisable to omit names and addresses from the questionnaires. 
Learning Outcomes


After reading this chapter, students should be able to:


> Understand the four basic measurement techniques in business research
> Learn different measurement scales under comparative and non-comparative scaling techniques
> Select the correct measurement scales for different types of statements in the questionnaire
and take a number of practical decisions into account while developing the scale for questions
> Test the measurement instruments for their degree of stability, consistency and reliability


Introduction 


• Measurement is fundamental to business research. Without appropriate measurement, it is
difficult, if not impossible, to comment on business behavior or business phenomena.


• A scale of measurement allows the investigator to make comparisons of amounts and changes
in the variable being measured.


• Measurement consists of two basic processes called conceptualization and operationalization.




• First, variables are defined by conceptual definitions (constructs) that explain the concept
the variable is attempting to capture.


• Second, variables are defined by operational definitions, that is, definitions of how
variables will be measured.


• This is followed by an advanced process of determining the levels of measurement and
measuring the reliability and validity of the instrument, which is the focus of the current chapter.


Basic Measurement Techniques
Level of measurement is important while measuring the variables.


The higher the level of measurement of a variable, the more statistical techniques
can be used to analyze it.


There are four basic types of measurement in business research.


They are nominal, ordinal, interval and ratio.


Nominal Scale 


Nominal measurement consists of assigning items to groups or categories.
It has the following features:


• It is used to indicate categories.
• Numbers are only used as labels.
• It has no numerical significance.
• It does not represent any order or distance.


• Nominally scaled variables cannot be used to compute statistics such as the mean and
standard deviation, because such statistics have no meaning for nominal scale variables.


• With nominally scaled variables, the analysis is confined to frequency counts and
cross-tabulation.


• The chi-square test can be performed on a cross-tabulation of nominal scale data.
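
The analysis described above for nominal data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey data (gender and preferred brand) are hypothetical, and the helper names `cross_tab` and `chi_square` are made up for this example. The statistic computed is the usual Pearson chi-square, the sum of (observed - expected)^2 / expected over all cells.

```python
from collections import Counter


def cross_tab(pairs):
    """Count how often each (row_category, column_category) pair occurs."""
    return Counter(pairs)


def chi_square(pairs):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    observed = cross_tab(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    stat = 0.0
    for row in row_totals:
        for col in col_totals:
            # Expected count if row and column variables were independent
            expected = row_totals[row] * col_totals[col] / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical nominal data: (gender, preferred brand) for 100 respondents
responses = (
    [("male", "A")] * 30 + [("male", "B")] * 10
    + [("female", "A")] * 20 + [("female", "B")] * 40
)
stat, dof = chi_square(responses)
print(f"chi-square = {stat:.2f} with {dof} degree(s) of freedom")
```

The resulting statistic would then be compared with a chi-square critical value for the given degrees of freedom. Note that only counts are used; no mean or standard deviation of the category labels is ever computed.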


Ordinal Scale


• The main characteristic of the ordinal scale is that the categories have a logical or
ordered relationship to each other.


• These types of scale permit the measurement of degrees of difference, but not the specific
amount of difference.


• This scale is very common in marketing, satisfaction and attitudinal research.


• Ordinal scale implies "ranking" on the basis of preference.
1. It does not say anything about the "distance".
2. Ranks are not interchangeable as nominal scale labels are.




• In addition to frequency tabulation and cross-tabulation, the other statistics that can be
used with the ordinal scale are the median, various percentiles such as quartiles, and rank
correlation.


• The arithmetic mean should not be used, as the average ranking does not make any sense here.
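
As a rough illustration of the statistics listed above for ordinal data, the sketch below computes a median, quartiles and a Spearman rank correlation using only the Python standard library. The two rating lists are hypothetical, and the helpers `average_ranks`, `pearson` and `spearman` are names chosen for this example; Spearman's coefficient is obtained here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import statistics


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal ratings (1 = lowest, 5 = highest) from seven respondents
satisfaction = [1, 2, 2, 3, 4, 4, 5]
loyalty = [1, 1, 2, 3, 3, 5, 4]

median_rating = statistics.median(satisfaction)
quartiles = statistics.quantiles(satisfaction, n=4)
rho = spearman(satisfaction, loyalty)
print(median_rating, quartiles, round(rho, 3))
```

All three statistics depend only on the order of the responses, which is why they remain meaningful when the distances between scale points are unknown.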


Interval Scale


• The interval scale is also known as a rating scale, and variables (attributes) are measured
on different scales such as a scale of 1 to 5, 1 to 7 or 1 to 10.


• In this scale, the points between the scale elements are assumed to be equidistant. This
means that we can interpret differences in the distance along the scale.


• Interval data can be used to calculate the mean, standard deviation, correlation
coefficient, regression, analysis of variance, factor analysis, and a whole range of advanced
multivariate and modeling techniques.


Ratio Scale 


• A ratio scale possesses all the properties of the nominal, ordinal and interval scales and,
in addition, an absolute, meaningful zero point (example: age).


• It is the top level of measurement and it can be used for all the statistics as in the case
of the interval scale.
ed Measurement Techniques


The various types of scaling techniques used in research can be classified into two
categories: (i) comparative scales and (ii) non-comparative scales.


In a comparative rating scale, the respondent evaluates (assigns a value to) a particular
item/brand/product in comparison to other items/brands/products.
The evaluation is not possible without comparison.


On the other hand, in non-comparative scaling respondents evaluate each item/brand/product
independently, without comparing it with any other items/brands/products.


Comparative Scales


The comparative scales can be further divided into the following four types of scaling
techniques:
(i) Paired comparison scale
(ii) Rank order scale
(iii) Constant sum scale and
(iv) Q-sort scale.


Paired Comparison Scale


This is a comparative scaling technique in which a respondent is presented with two objects
at a time and asked to select one object (rate between two objects at a time) based on some
criterion.

The data obtained is ordinal in nature.


The respondent has to make n(n-1)/2 paired comparisons for n items.
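
To make the n(n-1)/2 count concrete, here is a minimal sketch that enumerates every pair a respondent would have to judge. The brand names are hypothetical and `paired_comparisons` is a helper name chosen for this example; `itertools.combinations` yields each unordered pair exactly once.

```python
from itertools import combinations


def paired_comparisons(items):
    """Return every unordered pair of items that must be compared."""
    return list(combinations(items, 2))


# Hypothetical set of five brands to be compared two at a time
brands = ["A", "B", "C", "D", "E"]
pairs = paired_comparisons(brands)

n = len(brands)
print(len(pairs), n * (n - 1) // 2)  # both counts agree
for first, second in pairs:
    print(f"Which do you prefer: {first} or {second}?")
```

Because the number of pairs grows roughly with the square of n, doubling the number of objects roughly quadruples the respondent's workload.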


Advantages


Special techniques such as multidimensional scaling require the data to be collected
based on paired comparison.


The data obtained is ordinal.


However, when the number of objects (n) is large, there is a high risk of ill-considered
answers or refusal to answer.


Rank Order Scale


The respondent is presented with several objects simultaneously and asked to order or rank them
according to some criterion.

Advantages

It takes less time than paired comparison scaling. If there are n stimulus
objects, only (n-1) scaling decisions need to be made.

Rank order scaling is commonly used to measure preferences for brands as well as
attributes.


Rank order scaling results in ordinal data.


Constant Sum Scale


The respondent allocates a constant sum of units (usually points) among a set of stimulus
objects with respect to some criterion.
If an attribute is unimportant, the respondent may assign it zero points.


If an attribute is twice as important as some other attribute, it receives twice as many points.


It allows fine discrimination among stimulus objects without requiring too much time.
Constant sum scaling measures not only the preference but also the degree to which a
particular attribute is more important than others.


However, respondents may allocate more or fewer units than those specified.
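
The risk that respondents allocate more or fewer units than specified can be checked mechanically. The following is a small sketch, assuming a constant sum of 100 points and hypothetical attribute names; `check_constant_sum` and `relative_importance` are illustrative helpers, not part of any survey package.

```python
def check_constant_sum(allocation, total=100):
    """Return (is_valid, difference), where difference = allocated - total."""
    allocated = sum(allocation.values())
    has_negative = any(points < 0 for points in allocation.values())
    return (allocated == total and not has_negative), allocated - total


def relative_importance(allocation):
    """Express each attribute's points as a share of all points allocated."""
    allocated = sum(allocation.values())
    return {attr: points / allocated for attr, points in allocation.items()}


# Hypothetical response: 100 points spread over four product attributes
valid_response = {"price": 40, "quality": 40, "packaging": 20, "colour": 0}
faulty_response = {"price": 50, "quality": 40, "packaging": 20, "colour": 0}

print(check_constant_sum(valid_response))   # allocation matches the total
print(check_constant_sum(faulty_response))  # 10 points too many
print(relative_importance(valid_response))
```

In the valid response, price receives twice as many points as packaging, which is read as price being twice as important; this ratio interpretation is what distinguishes the constant sum scale from simple ranking.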


Q-Sort Scale 


• It is a rank order procedure in which the objects are sorted into piles based on similarity
with respect to some criterion.
The number of objects to be sorted varies between 60 and 140 approximately.


Say there are nine brands. On the basis of taste we can classify the brands into tasty,
moderate and non-tasty. We can also classify on the basis of price as low, medium and high.
In this way we can learn whether people prefer low-priced, moderate or high-priced brands.
Similarly, we can sort 60 brands into three piles: low, medium and high.


Q-sort technique is an attempt to classify subjects in terms of their similarity to 


tudy. 


Non-Comparative Scales 
Non-comparative scales are further divided into the following four types:
(i) Continuous rating scale
(ii) Likert scale
(iii) Semantic differential scale and
(iv) Stapel's scale


Continuous Rating Scale
It is the oldest and most widely used method for performance appraisal.


The respondent is asked to rate the objects by placing a mark at the appropriate position on a
line that runs from one extreme of the criterion variable to the other.


The respondent is not restricted to selecting from marks previously set by the researcher. It is
also known as a graphic rating scale.
Here, respondents do not necessarily need to choose a predefined point.
They are free to occupy any position in the graph.


Likert Scale


It is one of the most popular non-comparative rating scaling techniques in management
research.


In this scale, the respondent indicates a degree of agreement or disagreement with each of a
series of statements about the stimulus objects.


Each statement is assigned a numerical score ranging from either -2 to +2 or 1 to 5. The total
score of each respondent is calculated by summing across the items.
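
The scoring rule above can be illustrated with a short sketch. The five response labels and the sample answers are assumptions made for this example, as are the helper names `score_response` and `total_score`; the code simply maps each label to 1 to 5 (or -2 to +2 when centred) and sums across items.

```python
LABELS = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]


def score_response(label, centred=False):
    """Map a response label to 1..5, or to -2..+2 when centred is True."""
    score = LABELS.index(label) + 1
    return score - 3 if centred else score


def total_score(labels, centred=False):
    """Sum the item scores of one respondent across all statements."""
    return sum(score_response(label, centred) for label in labels)


# Hypothetical answers of one respondent to four statements
answers = ["Agree", "Strongly agree", "Disagree", "Agree"]
print(total_score(answers))                # scored on 1 to 5
print(total_score(answers, centred=True))  # scored on -2 to +2
```

The two coding schemes differ only by a constant of 3 per item, so they rank respondents identically.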


Advantages


The Likert scale is easy to construct and administer.
Respondents readily understand how to use the scale, making it suitable for mail, telephone
or personal interviews.


Disadvantages


It takes longer to complete than other itemized rating scales because the respondent has to
read each statement.
Care needs to be taken when using Likert scales in cross-cultural research, as there may be
cultural variations in willingness to express disagreement.


Semantic Differential Scale


Another scale that is commonly used by business researchers is the semantic differential
scale.

It is quite similar to a rating scale in which the end-points are associated with bipolar
labels (adjectives) such as "cold" and "warm" or "unreliable" and "reliable", and so on.


There are many intermediate points between the two extreme points, which could be coded as 1
to 5 or 1 to 7.


Advantages


The advantage of using the semantic differential is its simplicity, while producing results
comparable with those of the more complex scaling methods.


The method is easy and fast to administer, but it is also sensitive to small differences in
attitude, highly versatile, reliable and generally valid.


Stapel's Scale


Stapel's scale, developed by Jan Stapel, is useful for researchers to understand the positive
and negative intensity of attributes of respondents.


It has the following distinctive features:


Each item has only one word/phrase indicating the dimension it represents. 


Each item has ten response categories. 


Each item has an even number of categories. 


The response categories have numerical labels but no verbal labels. 


Practical Consideration 


• While developing the scale for questions, a number of practical decisions should be
taken into account.
• They are described below.


(i) Number of scale categories
(ii) Number of items to measure concept
(iii) Odd or even number of categories
(iv) Balanced scale or unbalanced scale
(v) Forced choice or non-forced choice


Number of Scale Categories 


• One of the important decisions is whether to use a 5-point, 7-point or 10-point scale.


• From a research design perspective, the larger the number of categories, the
greater the precision of the measurement scale.


Number of Items to Measure Concept
• Concepts are measured using scales with multiple items, known as multi-item scales.


• A multi-item scale consists of a number of closely related individual statements
(items) whose responses are combined into a composite score.


• Generally, it is common to see five to seven items, or even more, used to measure a single
concept. A minimum of three items is a must to achieve acceptable reliability.


Odd or Even Number of Categories 

• There is debate on whether one should use an even or odd number of points on a
rating scale.

• With an odd number of categories, the mid-point of the scale represents a neutral
position.

• This type of scale is generally used when, based on the experience or judgement
of the researchers, it is believed that a part of the sample is likely to feel neutral
about the issue being examined.

• However, there are reasons a researcher might prefer an even scale over an odd
one. The two formats are shown below.

Even (four-point) scale: Strongly disagree / Disagree / Agree / Strongly agree

Odd (five-point) scale: Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree


Balanced Scale or Unbalanced Scale 


• Scales can be either balanced or unbalanced.
• The scale is known as balanced if the number of positive (favourable) options is
equal to the number of negative (unfavourable) options.


Balanced scale: Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree


• On the other hand, the scale is known as unbalanced if the number of positive
options is greater than or less than the number of negative options.


• The scales given below are examples of unbalanced scales.


Unbalanced scale (more negative options): Strongly disagree / Disagree / Slightly disagree / Agree / Strongly agree

Unbalanced scale (more positive options): Strongly disagree / Disagree / Neither agree nor disagree / Somewhat agree / Agree / Strongly agree


Forced Choice or Non-forced Choice
• The discussion on even-point scales leads us into a discussion of forced and unforced
choice questions.
• The four-point scale below is an example of forced choice.


Forced choice: Strongly disagree / Disagree / Agree / Strongly agree


• In this case, the respondents are forced to either agree or disagree.


• However, the five-point scale below is an example of an unforced choice question.


Unforced choice: Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree


Characteristics of Good Measurement


Sound measurement must meet the tests of validity, reliability and practicality.


In fact, these are the three major considerations one should use in evaluating a
measurement tool.
Validity refers to the extent to which a test measures what we actually wish to measure.
Reliability has to do with the accuracy and precision of a measurement procedure.


Practicality is concerned with a wide range of factors of economy, convenience and
interpretability.


Test of Validity


Validity is the extent to which the instrument measures what it claims to measure.

For example, a test that is used to screen applicants for a job is valid if its scores are
directly related to future job performance.

The types of validity are:
(i) Content validity
(ii) Face validity
(iii) Criterion validity
(iv) Construct validity


Content Validity 


Content validity pertains to the degree to which the instrument fully assesses or 


measures the construct of interest. 


It occurs when the experiment provides adequate coverage of the subject being 


studied. 


Face Validity 
The term face validity has a similar meaning. 


However, face validity generally refers to "non-expert" judgements of individuals
completing the instrument and/or executives who must approve its use.


Respondents may refuse to cooperate or may fail to treat seriously measurements 


that appear irrelevant to them. 


Therefore, to the extent possible, researchers should strive for face validity. 


Criterion Validity 


• Criterion validity relates to our ability to predict some outcome or estimate
the existence of some current condition.

• It reflects the success of measures used for some empirical estimating purpose.
• Any criterion must be judged on four qualities:

(i) Relevance
(ii) Freedom from bias
(iii) Reliability
(iv) Availability


A criterion is relevant if it is defined in terms we judge to be the proper
measure.


Freedom from bias is attained when the criterion gives each subject an equal
opportunity to score well.
A reliable criterion is stable and reproducible.


Finally, the information specified by the criterion must be available. If it is not
available, how much will it cost and how difficult will it be to secure?


Criterion-related validity is expressed as the coefficient of correlation between 
test scores and some measure of future performance or between test scores and 


scores on another measure of known validity. 


Construct Validity 


• Construct validity is the most complex and abstract.

• In attempting to evaluate construct validity, we consider both the theory and the
measuring instrument being used.

• A measure is said to possess construct validity to the degree that it conforms to
predicted correlations with other theoretical propositions.


Test of Reliability


• The reliability of a measure indicates the extent to which it is without bias (error-free)
and thus ensures consistent measurement across time and across the various items in
the instrument.
• A measure is reliable to the degree that it supplies consistent results.


• Reliability is a necessary contributor to validity but is not a sufficient condition for
validity.


• The reliability of a measure is an indication of the stability and consistency with
which the instrument measures the concept and thus helps to assess the goodness of a measure.


Stability of Measures


• A measure is said to possess stability if one can secure consistent results with
repeated measurements of the same subject (respondent) with the same instrument.
• Two tests of stability are
(i) test-retest reliability
(ii) parallel-form reliability


Internal Consistency of Measures 


The internal consistency of measures is indicative of the homogeneity of items 


that measure the construct. 


The items should hang together as a set and be capable of independently 
measuring the same concept so that the subjects attach the same overall meaning 


to each of the items. 


This can be examined through the following two tests. 


Inter-item Consistency Reliability


• Inter-item consistency reliability, also known simply as "internal consistency", measures
the degree to which different items measuring the same construct attain consistent
results.


• Scores on different items designed to measure the same construct should be highly
correlated.


• The most popular test of inter-item consistency reliability is Cronbach's coefficient
alpha.


• A Cronbach's alpha of 0.6-0.7 indicates an acceptable level of reliability, and 0.8 or
greater a very good level.
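
For readers who want to see how the coefficient is obtained, the sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix is hypothetical and `cronbach_alpha` is a helper name chosen for this example; sample variances are taken from Python's `statistics` module.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variances = [statistics.variance(column) for column in columns]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three items on a 1-5 scale
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

When respondents answer all items in a similar way, the variance of the total scores is large relative to the sum of the item variances, which pushes alpha towards 1.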


Split-half Reliability 


• Split-half reliability reflects the correlation between two halves of an
instrument.
• The two halves can be created by splitting the items in several ways:


(i) Odd and even numbered items
(ii) First and second halves
(iii) Randomly
• If the correlation is high, the instrument is said to have high reliability in terms of
internal consistency, which tells us there is homogeneity among the items.
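
A minimal sketch of the odd/even split described above follows. The response matrix is hypothetical, and `pearson` and `split_half_correlation` are helper names chosen for this example; each respondent's odd-numbered and even-numbered items are summed separately and the two sets of half-scores are then correlated.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_correlation(item_scores):
    """Correlate each respondent's odd-item total with the even-item total."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson(odd_totals, even_totals)


# Hypothetical answers of five respondents to four items on a 1-5 scale
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
r = split_half_correlation(scores)
print(f"split-half correlation = {r:.3f}")
```

A first-half versus second-half split, or a random split, would only change how the two slices of each row are chosen; the correlation step stays the same.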


Test of Practicality 


• The economy consideration suggests that some trade-off is needed between the ideal
research project and that which the budget can afford.


• The convenience test suggests that the measuring instrument should be easy to administer.
One should give due attention to the proper layout of the instrument, with clear
instructions and coding.


• The interpretability consideration is more important when persons other than the
designer of the instrument are to interpret the results.
Learning Outcomes


After reading this chapter, students should be able to:
• Understand the basic rules of questionnaire design
• Implement do's and don'ts while designing the questionnaire
• Know how to validate the survey questionnaire
• Know how to pre-test the questionnaire
• Understand how to plan fieldwork for the survey


Introduction 


• The design of a questionnaire depends on the purpose of data collection.


• It depends on whether the researcher's objective is to collect qualitative
or quantitative information.


• Qualitative information is needed if the study is more exploratory in nature, for the
purpose of better understanding of a given research problem or the generation of
hypotheses on a subject.


• Quantitative information is needed if the study is more of a conclusive study,
which requires testing of hypotheses that have been previously generated.


Exploratory questionnaires 
• It may not be necessary to have a formal questionnaire if the data to be
collected is qualitative in nature and thus does not require
validation.
• The researcher might prepare a brief guide, listing important open-ended
questions with appropriate probes/prompts listed under each.


Formal standardized questionnaires 
• If the researcher wants to statistically analyse the data to test the hypotheses of a
conclusive study, a formal standardized questionnaire must be designed.
• Such questionnaires are generally characterized by:


> Prescribed wording and order of questions, to ensure that each respondent receives
the same stimuli.


> Prescribed definitions or explanations for each question, to ensure that interviewers
handle questions consistently and can answer respondents' requests for clarification
if they occur.


> Prescribed response format, to enable rapid completion of the questionnaire during
the interviewing process.


Questionnaire Design for Business Research 


• If the same study with exactly the same hypotheses is set for five
researchers, most likely each of them will come up with their own
questionnaire.


• The five questionnaires set independently by the five researchers may vary
widely in their choice of questions, wording, coding, sequencing, use of
open-ended questions and so on.


There are a number of points that should be borne in mind. 


% A questionnaire should be designed in such a way that it meets the objectis 
by the researcher. 


% The questionnaire design should be such that it obtains the complete and accu 
as far as possible. The researcher must ensure that questions are asked in su 
respondent fully understand the meaning of the questions and are not likel 
answer or lie to interviewer. 


% A well-designed questionnaire should ensure that it is easy for the respondents to 1 
questions as well as easy for the interviewer to record the answer of the respond 


% The questionnaire should be designed in such a way that it makes the intervie 
the point, so as to make respondents remain interested till the end of the surve 


Defining the Target Population 


• The researcher must define the target population about which he or she wants to generalize from the sample data to be collected.

• Secondly, researchers have to draw up a sampling frame and decide on the method of sampling and the sample size.

• Information such as the age, education, etc. of the target respondents helps ensure that the characteristics of the sample are as similar as possible to the characteristics of the target population.


Language 


• A questionnaire printed in English could be administered to respondents in the language they speak by a trained interviewer who can translate each question on the spot.


Deciding on What to Ask 
• The next obvious question is "What are you going to ask?"

• In order to meet the objectives of the survey, it is important to decide what information one needs to collect from the respondents.

• A number of questions related to the research problem under investigation might come to mind.

• The real issue is which questions should be asked and which should be avoided.


There are three potential types of information that researchers should be interested in:

• Information they are primarily interested in, that is, dependent variables.

• Information which might explain the dependent variables, that is, independent variables.

• Other factors related to both dependent and independent variables which may distort the results and have to be adjusted for, that is, confounding variables.


Deciding How to Ask 


• Questions can be asked in closed-ended form or open-ended form.

Closed-ended questions
  Advantages:
  • Easy and quick to answer
  • Easy to compare responses across respondents
  • Answers easier to analyse on computer
  • Response choices make questions clearer
  • Easy to replicate study
  Disadvantages:
  • Can put ideas in the respondent's head
  • Respondents can feel constrained/frustrated
  • Many choices can be confusing
  • Cannot tell if the respondent misinterpreted the question
  • Fine distinctions may be lost

Open-ended questions
  Advantages:
  • Permit an unlimited number of answers
  • Respondents can qualify and clarify responses
  • Can find the unanticipated answer
  • Reveal respondents' thinking process
  Disadvantages:
  • Answers can be irrelevant
  • Inarticulate or forgetful respondents are at a disadvantage
  • Coding responses is subjective and tedious
  • Require more response time and effort


Wording of Individual Questions 
• Use short and simple sentences
• Ask for only one piece of information at a time
• Avoid negatives if possible
• Ask precise questions
• Questions must generate the required information
• Words must have the same meaning to all respondents
• Avoid leading questions
• Avoid hypothetical questions
• Do not overtax the respondent's memory
• Ensure those you ask have the necessary knowledge
• Level of detail
• Sensitive issues
• Minimize bias


Sequencing of Questions 


• Opening questions
• Question flow
• Question variety

The following points should be considered while sequencing questions in the questionnaire.

• Put the most important items in the first half of the questionnaire.
• Do not start with awkward or embarrassing questions.
• Start with easy and non-threatening questions.
• Go from the general to the particular.
• Go from factual to abstract questions.
• Go from closed to open questions.
• Leave demographic and personal questions until last.


Length of Questionnaire 


• There is no universal agreement about the optimal length of questionnaires.

• It probably depends on a number of factors, such as the number of objectives of the study, the type of respondents (whether the target respondents are consumers, managers, students or kids) and the type of survey.

• As a rule of thumb, ask only necessary questions and avoid unnecessary ones.


Ease of Recording 
Questionnaire design should ensure that:

• The questionnaire is easy to carry and legible in different kinds of light.

• The distance between different answer categories is sufficient.

• There is no confusion or mistake while placing a tick over the actual response to a question.


Coding 


• If the questionnaire is coded before doing the fieldwork (as most questionnaires tend to be these days), it must be ensured that the fieldworkers know where to mark the code or actual answer choice.

Analysis Required

• It is important to plan for analysis well before designing the questionnaire.

• For example, if you plan to use factor analysis to empirically find out the important dimensions of a theoretical construct, the questions need to be measured on a rating scale.


General Appearance of a Questionnaire 


• The physical appearance of a questionnaire can have a significant influence upon both the quantity and quality of survey data obtained.

• The appearance of the questionnaire has an impact on respondents' motivation in giving responses.


Introduction to Respondents 


• It is important to have a covering letter if the questionnaire is to be mailed or distributed to respondents.

• The purpose of such a letter is to introduce respondents to the objectives of the survey in order to encourage them to participate actively in the survey.


Instructions 


• Interviewer instructions should be placed alongside the questions to which they pertain.

• Instructions on where the interviewers should probe for more information and how replies should be recorded are placed after the question.


Concluding Questionnaire with an Open-ended Question and Thanks 

• The questions that are of special importance should be placed in the early part of the questionnaire and the sensitive questions towards the end of the questionnaire.

• It is always good to end the questionnaire with an open-ended question asking for the respondent's opinion on the topic.

• At the end of the questionnaire, do not forget to thank the respondent once again for the valuable time spent on completing the survey.


Pre-testing Questionnaire 
• Pre-testing the questionnaire is an essential step before its completion.

• The purpose of the pre-test is to check question wording, and to obtain responses to open-ended questions with a view to designing a multiple-choice format in the final questionnaire.

The purpose of pre-testing the questionnaire is to determine:

• Whether the wording of the questions is correct and conveys the same meaning to all respondents.
• Whether the questions have been placed in the right sequence.
• Whether the questions are clearly understood by all classes of respondents.
• Whether additional questions are needed or whether some questions should be eliminated.
• Whether the instructions to interviewers are clear and adequate.


Design of Fieldwork 


• Careful planning is required for prompt receipt of survey questionnaires from the different areas covered in the data collection undertaking.

• A clear plan for data collection should be developed by the researcher.


Fieldwork Plan 


• The fieldwork plan is clearly linked to the sampling plan.

• Once the sampling areas (cities, towns, etc.) and the sample size are decided for each, the next step is to plan the following:

  • Who will do the fieldwork?
  • When should the fieldwork start?
  • How long should the fieldwork be carried on?

• First, you need to plan who will do the fieldwork.

• The second question is when the data collection should be carried out.

• For the third issue, i.e. the time required for data collection, one should consider the following points.

Step 1: Calculate the:

  • time required to reach the study area(s),
  • time required to locate the study units (persons, groups or records), and
  • number of visits required per study unit. In longitudinal studies the survey is carried out from time to time, whereas in a cross-sectional study it is done only once.

Step 2: Calculate the number of interviews that can be carried out per fieldworker per day.

Step 3: Calculate the number of days needed to carry out the interviews.
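
The arithmetic behind Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function name `fieldwork_days` and the figures (200 interviews, 5 fieldworkers, 4 interviews per fieldworker per day) are assumptions, not values from the slides, and travel time from Step 1 would have to be added separately.

```python
import math


def fieldwork_days(sample_size, interviews_per_worker_per_day, n_fieldworkers):
    """Working days needed to complete all interviews, rounded up."""
    daily_capacity = interviews_per_worker_per_day * n_fieldworkers
    return math.ceil(sample_size / daily_capacity)


# Hypothetical plan: 200 interviews, 5 fieldworkers, 4 interviews each per day.
days = fieldwork_days(sample_size=200, interviews_per_worker_per_day=4, n_fieldworkers=5)
print(days)  # 10
```

Rounding up matters here: a plan that needs 10.2 days of capacity still occupies 11 calendar days.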
Google: A multinational technology company that specializes in internet-related services and products.

Google apps: Gmail, Drive, Docs, Sheets, Slides, Meet, Classroom, etc.

Google Forms: An online tool that allows users to create and distribute surveys, quizzes and forms.

Export Data to Excel: Export the collected data for further analysis or processing in Excel.

Creating a form | Collecting data | Data analysis | Advanced features


Creating A Google Form 


• Start by deciding what type of questions you want to ask.
• Go to forms.google and log in with a Google account.
• Create a new form; the sample templates are sometimes useful.
• Customize the form by adding questions and selecting the appropriate response types.
• Customize the design and appearance of the form using themes and colors.
• Lastly, share the form with the respondents.


Collecting Data 


• Once the Google Form is live, start collecting responses.
• Share the form via email, social media, a direct link or by embedding it in a website.
• Once respondents start filling out the form, the responses are automatically collected and stored.
• View responses in real time.
• Send out reminders to those who have not responded yet.
• Export the data into different formats, such as CSV or Excel, for further analysis.


Basic Data Analysis on Google Form

• View a summary of responses.
• The summary provides an overview of response counts, average values and charts for multiple-choice questions.
• Bar charts and pie charts are generated automatically.
• The summary gives a quick snapshot of the data.

[Figure: pie chart of responses to "What color do you prefer to play as?" (77 responses); legend: Black, Blue, Brown, Cyan, Lime, Green, Orange, Pink, Purple, Red]


Basic Data Analysis on Google Sheets 


Downloading Data

• Export the collected data from Google Forms to a Google Sheets spreadsheet.
• This provides more flexibility for in-depth data analysis using the spreadsheet functions, formulas and advanced charting options available in Google Sheets.


Third-Party Tools

If you require more advanced data analysis:

• Export the data from Google Sheets to other data analysis tools such as Microsoft Excel, Google Data Studio or statistical analysis software such as SPSS, R, Python, EViews and so on.
• These tools offer extensive capabilities for data manipulation, visualization and advanced statistical analysis.
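
As a minimal sketch of what analysis outside Google Sheets might look like, the Python snippet below tallies the answers to one question from a CSV export using only the standard library. The column name `Preferred color` and the four rows are invented stand-ins for a real export; an actual export will have its own column headers.

```python
import csv
import io
from collections import Counter

# Stand-in for a CSV export from Google Forms; the column name and rows are invented.
EXPORTED_CSV = """Timestamp,Preferred color
2024-01-01 09:00:00,Red
2024-01-01 09:05:00,Blue
2024-01-01 09:10:00,Red
2024-01-01 09:15:00,Green
"""


def tally_answers(csv_file, column):
    """Count how often each answer appears in one column of the export."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = tally_answers(io.StringIO(EXPORTED_CSV), "Preferred color")
total = sum(counts.values())
for answer, count in counts.most_common():
    print(f"{answer}: {count} ({count / total:.0%})")
```

With a real export, `io.StringIO(EXPORTED_CSV)` would be replaced by an opened file object for the downloaded CSV.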
Questionnaire Design

• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
• Break the survey into naturally occurring sections.
• Let respondents bypass sections that are not applicable (e.g., "if you answered no to Question 7, skip directly to Question 15").
• Pretest and revise as needed.
• Keep it as short as possible.


Type of Question        Example

Open-ended question     Briefly describe your job goals.

Fill-in-the-blank       How many times did you attend formal religious services during the last year? ____ times

Check boxes             Which of these statistics packages have you ever used?
                        [ ] SAS       [ ] Visual Statistics
                        [ ] SPSS      [ ] MegaStat
                        [ ] Systat    [ ] Minitab

Ranked choices          "Please evaluate your dining experience"
                                       Excellent   Good   Fair   Poor
                        Food
                        Service
                        Ambiance
                        Cleanliness
                        Overall

Pictograms              "What do you think of the President's economic policies?" (circle one)
                        [row of face pictograms]

Likert scale            Statistics is a difficult subject.
                        [ ] Strongly Agree
                        [ ] Slightly Agree
                        [ ] Neither Agree Nor Disagree
                        [ ] Slightly Disagree
                        [ ] Strongly Disagree
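
Before Likert responses like those above can be analysed, they are usually converted to numeric codes. The sketch below shows one possible coding in Python; the 5-to-1 scoring direction and the five sample responses are assumptions made for illustration, not part of the slides.

```python
from statistics import mean

# Assumed coding scheme: higher numbers mean stronger agreement.
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Slightly Agree": 4,
    "Neither Agree Nor Disagree": 3,
    "Slightly Disagree": 2,
    "Strongly Disagree": 1,
}


def code_responses(responses):
    """Translate Likert labels into their numeric codes."""
    return [LIKERT_CODES[label] for label in responses]


# Hypothetical answers to "Statistics is a difficult subject."
responses = [
    "Strongly Agree",
    "Slightly Agree",
    "Slightly Agree",
    "Neither Agree Nor Disagree",
    "Strongly Disagree",
]
codes = code_responses(responses)
print(codes)        # [5, 4, 4, 3, 1]
print(mean(codes))  # 3.4
```

Fixing the coding scheme before fieldwork keeps data entry consistent across fieldworkers.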
Chapter 9
Sampling Design: Theory and Practice

Learning Outcomes

After reading this chapter, students should be able to:
• Understand why you need to sample the population
• Know the basic terminology
• Understand the differences between probability and non-probability sampling
• Apply the appropriate sampling technique
• Determine the sample size
• Understand the factors that could affect the sample size in any study
• Understand the different types of errors in research
Introduction

• The concept of sampling is used in our day-to-day life.

• Sampling may be defined as the selection of some part of the population on the basis of which a judgement or inference about the entire population is made.

• In most research work and surveys, the usual approach is to make generalizations or draw inferences, based on samples, about the parameters of the population from which the samples are taken.

• So the sample should be drawn in such a way that it is truly representative of the entire population.

• The process of sample selection is called sample design, and a survey conducted on the basis of a sample is described as a sample survey.


Why Sampling? 


• The question is "Why sample the population?" The following points justify choosing a sample rather than going for a complete census of the target population.

  • The population is dynamic, i.e. the components of the population could change over time. Thus, it is practically impossible to check all items in the population.

  • The cost of studying the entire population could be very high. A sample study is usually less expensive than a census study.

  • Contacting the whole population would often be time consuming. Sampling can save time, as the results can be produced relatively quickly.

• Sampling remains the only way when the population contains infinitely many members or when the experiment involves the destruction of the items under study.

• Sampling usually makes it possible to estimate the sampling errors and assists in obtaining information concerning some characteristics of the population.
Defining Basic Terminology
• Population, Element and Population Size
• Sample, Subject and Sample Size
• Parameter and Statistics
• Sampling Frame

Population, Element and Population Size

• Population refers to the entire group of people, events or things of interest that the researcher wishes to investigate.

• Each member of the population is known as an element.

• The total number of elements in the population is known as the population size and is denoted by "N".

Sample, Subject and Sample Size

• The sample is a subset of the population.
• Each member of the sample is known as a subject.
• The total number of subjects in the sample is known as the sample size and is denoted by "n".
Parameter and Statistics

• The characteristics of the population are known as parameters, whereas the characteristics of the sample are known as statistics.

• A statistic is used to estimate the value of the parameter.

• Note that the value of a statistic changes from one sample to the next, which leads to the study of the sampling distribution of the statistic.

Sampling Frame

• It is a complete listing of the population of interest from which the sample is drawn.

• All members of the sampling frame have a probability of being selected.

• Without some form of sampling frame, a random sample of a population, other than an extremely small population, is impossible.
Sampling Techniques

Probability Sampling Techniques

• In probability sampling, the sample is selected using random selection so that each element of the population has a known chance of being selected.

• It is generally assumed that a representative sample is more likely to result when this method of selection from the target population is employed.

• Thus, findings based on probability sampling can be generalized to the target population with a specified level of confidence.

Simple random sampling

• Simple random sampling is the most basic form of probability sampling.

• Here the sample is drawn from the target population in such a way that each and every member of the population has an equal and known chance of being a subject of the sample.

The procedure for drawing a large sample involves the following steps:

• Sequentially assign a unique identification number to each element of the population.

• Use a random number generator (or the lottery method) to identify the elements to be part of the sample.

• Ensure that no element is selected more than once.
N = 300 employees
n = 300 × 20% = 60 employees
The sample numbers are:

56 114 186 111 142 136 97 188 79 38 58 273 77 225 139
53 241 163 300 159 152 124 141 79 [illegible] 129 239 186 134 249
173 102 188 142 147 45 198 300 164 98 134 20 9 270 266
102 280 233 103 23 59 [illegible] 269 175 80 216 234 33 43 [illegible]
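
The three steps of simple random sampling can be sketched in Python for the same setting (N = 300, n = 60). The function name and the `seed` argument are assumptions added for reproducibility; `random.sample` draws without replacement, so no identification number can appear twice.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)               # seed is optional, for repeatable draws
    ids = range(1, population_size + 1)     # step 1: sequential ID numbers
    return sorted(rng.sample(ids, sample_size))  # steps 2-3: random, no repeats


sample = simple_random_sample(population_size=300, sample_size=60, seed=1)
print(len(sample), len(set(sample)))  # 60 60
```

Checking that the list and its set have the same length is a quick way to confirm that no element was selected more than once.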
Stratified random sampling

• In stratified random sampling, the population is divided into different subgroups, known as strata, on the basis of some criteria.

• Then simple random sampling is used to draw the sample from each stratum.

• The sample can be drawn from each stratum either proportionately or disproportionately.

• In proportionate stratified random sampling, the sample taken from each stratum is equal to 20% of the population size of that stratum, as shown in column 3.
• In disproportionate stratified random sampling, the percentage of the population taken as the sample is not the same for the different strata.

• Disproportionate sampling decisions are made either when some stratum or strata are too small or too large, or when more variability is suspected within a particular stratum.

• Disproportionate sampling is also sometimes done when it is easier, simpler and less expensive to collect data from one or more strata than from others.

N = 885 employees
n = 885 × 20% = 177 employees

Job level   Number of employees   Proportionate sample   Disproportionate sample
TM          15                    3
MM          40                    8
LM          60                    12
S           140                   28
C           600                   120
S           30                    6
Total       885                   177

[Disproportionate sample values are not recoverable from the slide.]
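
The proportionate column of the table above can be reproduced with a short Python sketch. The stratum labels are copied from the table as they appear (the second "S" is written `S (2)` here only to tell the two rows apart); the function name is an assumption.

```python
# Stratum sizes from the table above; labels kept as abbreviated on the slide.
STRATA = [("TM", 15), ("MM", 40), ("LM", 60), ("S", 140), ("C", 600), ("S (2)", 30)]


def proportionate_allocation(strata, fraction):
    """Return (label, stratum size, sample size) using the same fraction everywhere."""
    return [(label, size, round(size * fraction)) for label, size in strata]


allocation = proportionate_allocation(STRATA, fraction=0.20)
for label, size, n_stratum in allocation:
    print(f"{label:6} {size:4} {n_stratum:4}")

total_n = sum(n_stratum for _, _, n_stratum in allocation)
print(total_n)  # 177
```

A simple random sample of the allocated size would then be drawn inside each stratum.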
Systematic sampling

• Systematic sampling is very similar to random sampling, and is easier in practice.

• Once the sample size is decided, we divide the total population into n parts, where n is the sample size required.

• From the first part of sampling units, we pick one at random.

• We then pick every (N/n)th item from the remaining parts.

N = 300 households
n = 15 households
k = N/n = 300/15 = 20, i.e. every 20th household
First number = 7

The sample numbers are:
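
The sample numbers for this example can be generated with the following Python sketch. The function name is an assumption; the starting number 7 is taken from the example above, whereas in practice it would be drawn at random from 1 to k.

```python
def systematic_sample(population_size, sample_size, start):
    """Pick every k-th unit, k = N // n, beginning at the given start number."""
    k = population_size // sample_size   # sampling interval
    return [start + i * k for i in range(sample_size)]


sample = systematic_sample(population_size=300, sample_size=15, start=7)
print(sample)
# [7, 27, 47, 67, 87, 107, 127, 147, 167, 187, 207, 227, 247, 267, 287]
```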
Cluster Sampling

• A cluster is a group of sampling units or elements which can be identified or listed and a sample of which can be chosen.

• In simple random sampling, elements are randomly selected from the entire population of study.

• In cluster sampling, the population is divided into groups of elements, and some groups are randomly selected for the study.

• However, it is different from stratified sampling.

• The selection of a sample of clusters to provide a sample of population units is called cluster sampling.

• The two primary reasons cluster sampling is employed for sample surveys of human populations in large geographical regions are feasibility and economy.

• Cluster sampling is often the only feasible method of probability-based sampling because the sampling frames for the target populations are lists of clusters.

• Further, it is economical compared to simple random sampling, as the subjects of the sample are selected from the randomly selected clusters rather than from the entire population.

• However, the statistical efficiency of cluster sampling is usually lower than that of simple random sampling.
N = 500 households in Wards 1, 2, 3, 4, 5
n = 100 households

The sample numbers are:

Ward   N     n
1      50    50
2      100
3      50    50
4      100
5      100
Multistage cluster sampling

• In single-stage cluster sampling, the population is divided into convenient clusters; we randomly choose the required number of clusters as sample subjects and investigate all the elements in each of the randomly chosen clusters.

• Cluster sampling can also be done in several stages, in which case it is known as multistage cluster sampling.
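
Single-stage cluster sampling as described above can be sketched as follows. The ward sizes are the ones listed in the ward example; the function name, the choice of two clusters and the `seed` argument are assumptions for illustration.

```python
import random

# Households per ward, as listed in the ward example.
WARD_SIZES = {1: 50, 2: 100, 3: 50, 4: 100, 5: 100}


def single_stage_cluster_sample(cluster_sizes, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every element in them."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(cluster_sizes), n_clusters))
    sample = [(ward, household)
              for ward in chosen
              for household in range(1, cluster_sizes[ward] + 1)]
    return chosen, sample


chosen_wards, households = single_stage_cluster_sample(WARD_SIZES, n_clusters=2, seed=3)
print(chosen_wards, len(households))
```

Because whole wards are taken, the final sample size depends on which wards are drawn; in a multistage design a further random sample would be taken inside each chosen ward.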
Non-probability Sampling

• Non-probability sampling techniques are commonly used in cases where it is not possible to use probability sampling.

• The major difference is that in non-probability techniques, the extent of bias in selecting the sample is not known.

• This makes it difficult to say anything about the representativeness or accuracy of the sample.

• Nevertheless, if done carefully, some of these techniques are good approximations of probability sampling.
Convenience Sampling

• It refers to the collection of information from members of the population who are conveniently available to provide it.

• It involves picking any available set of respondents convenient for the researcher to use.

• Convenience sampling is very often used by researchers in order to complete a large number of surveys quickly and cost-effectively.

• However, it suffers from selection bias because the individuals surveyed are often different from the target population and thus may not be truly representative of the population for the study in hand.
Judgement Sampling

• A judgement sample, sometimes referred to as a purposive sample, involves selecting elements in the sample for a specific purpose.

• It involves the choice of subjects who are most advantageously placed or in the best position to provide the information required.

• A judgement sample might be a group of experts with knowledge about a particular problem or issue.
Quota sampling

• Quota sampling is a non-probability sampling technique wherein certain groups are adequately represented in the study through the assignment of a quota based on known characteristics.

• Quota sampling can be either proportional or disproportional.

• In proportional quota sampling, we represent the major characteristics of the population by sampling a proportional amount of each.

• Disproportional quota sampling is a bit less restrictive. In this method, we specify the minimum number of sampled units we want in each category.

• This method is typically used to ensure that smaller groups are adequately represented in the sample.
Determining the Sample Size

• Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results.

• To determine the sample size, three pieces of information are required:
  i. the degree of confidence necessary to estimate the true value,
  ii. the precision of the estimate, and
  iii. the amount of true variability present in the data.


Level of Precision

• Precision refers to how close our estimate is to the true population characteristic.

• We estimate the population parameter to fall within a range based on the sample estimate.

• The narrower this interval, the greater the precision.
  • 995000 - 1050000
  • 990000 - 1100000

• That is, with the first interval we estimate the population mean to lie within a narrower range, which in turn means that we estimate with greater precision.


Variability in Data

• Precision is a function of the range of variability in the sampling distribution of the sample mean.

• The smaller the dispersion or variability, the greater the probability that the sample mean will be close to the population mean.

• However, there is no need to take several different samples to estimate the variability.

• This variability is known as the standard error ($S_{\bar{X}}$) and is calculated as:

$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

where $S$ is the standard deviation of the sample and $n$ is the sample size.

Two points need to be noted here.

(i) In order to reduce the standard error ($S_{\bar{X}}$), one needs to increase the sample size at a given standard deviation of the sample ($S$).

(ii) The smaller the variation in the population, the smaller the standard error ($S_{\bar{X}}$), which in turn implies that the sample size need not be large.
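
The first point can be checked numerically with the usual standard-error formula, S divided by the square root of n. In the Python sketch below, the sample standard deviation of 10 and the three sample sizes are assumed values chosen only to make the pattern visible.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the sample mean: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Assumed sample standard deviation S = 10; only the sample size changes.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Quadrupling the sample size only halves the standard error, which is why extra precision becomes progressively more expensive.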


• We need greater precision if we want our sample results to closely reflect the characteristics of the population.

• The greater the precision required, the larger the sample size needed, particularly when the variability in the population is large.


Confidence

• The confidence or risk level is based on ideas encompassed under the Central Limit Theorem.

• The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value.

• In a normal distribution, approximately 95% of the sample values are within two standard deviations (2σ) of the true population mean (μ).
Sample Data, Precision and Level of Confidence in Estimation

• Precision and confidence play a vital role in sampling, as we use sample data to draw inferences about the entire population.

• Because a point estimate provides no measure of possible error, we use interval estimation to ensure a relatively accurate estimation of the population parameters.

• The standard error $S_{\bar{X}}$ and the level of confidence determine the width of the interval, which is calculated using the following formula:

$$\mu = \bar{X} \pm Z \, S_{\bar{X}}$$

  • For a 90% level of confidence, the Z value is 1.65.
  • For a 95% level of confidence, the Z value is 1.96.
  • For a 99% level of confidence, the Z value is 2.58.
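
As a rough illustration of interval estimation, the Python sketch below builds an interval as the sample mean plus or minus Z times the standard error, using the three Z values listed above. The sample mean of 100 and standard error of 2 are assumed figures, not data from the slides.

```python
# Z values for the three confidence levels listed above.
Z_VALUES = {90: 1.65, 95: 1.96, 99: 2.58}


def confidence_interval(sample_mean, standard_error, confidence=95):
    """Interval estimate: sample mean plus/minus Z times the standard error."""
    half_width = Z_VALUES[confidence] * standard_error
    return sample_mean - half_width, sample_mean + half_width


# Assumed figures: sample mean 100, standard error 2.
for level in (90, 95, 99):
    low, high = confidence_interval(100.0, 2.0, confidence=level)
    print(level, round(low, 2), round(high, 2))
# 90 96.7 103.3
# 95 96.08 103.92
# 99 94.84 105.16
```

The output shows the trade-off: for the same data, higher confidence gives a wider and therefore less precise interval.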


Factors Affecting the Decision on Sample Size

Roscoe (1975) proposes the following rules of thumb for determining the sample size:

• Sample sizes larger than 30 and less than 500 are appropriate for most research.

• Where samples are to be broken into sub-samples such as male/female, Malay/Chinese/Indian, etc., a minimum sample size of 30 for each category is necessary.

• In multivariate data analysis (including multiple regression analysis), the sample size should be several times (preferably 10 times or more) as large as the number of variables in the study.

• Simple experimental research is possible with samples as small as 10 to 20.
Types of Error in Research

Sampling Error

• This is the error which occurs due to the selection of some units and non-selection of other units into the sample.

• It is controllable if the selection of the sample is done in a random, unbiased way.

• In other words, if probability sampling is used, it is possible to control this error.

• In general, sampling error reduces as the sample size increases.
Non-sampling Error

• This is the effect of various errors made in doing the study by the interviewers, the data entry operator or the researcher.

• Handling a large quantity of data is not an easy job, and errors may creep in at any stage of research.

• The data entry person may interchange the columns of yes and no responses while entering and compiling the data, or the interviewer may cheat by not filling in the questionnaire in the field and instead fudging the data.

The total error in research is the sum of the above two errors.

Total Error = Sampling Error + Non-sampling Error

• The sampling error can be estimated in the case of probability sampling, but not in the case of non-probability sampling.

• Non-sampling error can be controlled by hiring better field workers and qualified data entry persons, and by good control procedures at every stage of research.
Formula for the Minimum Sample Size Needed for an Interval Estimate of the Population Mean

$$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$$

where $E$ is the margin of error. If necessary, round the answer up to obtain a whole number. That is, if there is any fraction or decimal portion in the answer, use the next whole number for sample size $n$.

EXAMPLE 7-4 Automobile Thefts

A sociologist wishes to estimate the average number of automobile thefts in a large city per day within 2 automobiles. He wishes to be 99% confident, and from a previous study the standard deviation was found to be 4.2. How many days should he select to survey?

Since $\alpha = 0.01$ (or $1 - 0.99$), $z_{\alpha/2} = 2.58$ and $E = 2$. Substitute in the formula:

$$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 = \left(\frac{(2.58)(4.2)}{2}\right)^2 = 29.35$$

Round the value up to 30. Therefore, to be 99% confident that the estimate is within 2 automobiles of the true mean, the sociologist needs to sample the thefts for at least 30 days.
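
The calculation in Example 7-4 can be checked with a short Python sketch of the standard formula n = (z·σ/E)², rounded up. The function name is an assumption; the inputs are the ones given in the example.

```python
import math


def min_sample_size_mean(z, sigma, margin_of_error):
    """Minimum n for estimating a mean: (z * sigma / E) squared, rounded up."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Example 7-4: 99% confidence (z = 2.58), sigma = 4.2, E = 2.
n_days = min_sample_size_mean(z=2.58, sigma=4.2, margin_of_error=2)
print(n_days)  # 30
```

`math.ceil` implements the rule of always moving to the next whole number, so 29.35 becomes 30 rather than 29.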
Sample Size for Proportions

To find the sample size needed to determine a confidence interval about a proportion, use this formula:

Formula for Minimum Sample Size Needed for Interval Estimate of a Population Proportion

$$n = \hat{p}\,\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2$$

If necessary, round up to obtain a whole number.

This formula can be found by solving the margin of error value for $n$ in the formula

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$
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EXAMPLE 7—11 Land Line Phones 


A researcher wishes to estimate, with 95% confidence, the proportion of people who did 
not have a land line phone. A previous study shows that 40% of those interviewed did 
not have a land line phone. The researcher wishes to be accurate within 2% of the true 
proportion. Find the minimum sample size necessary. 


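Since $z_{\alpha/2} = 1.96$, $E = 0.02$, $\hat{p} = 0.40$, and $\hat{q} = 0.60$, then 


$$ n = \hat{p}\,\hat{q} \left( \frac{z_{\alpha/2}}{E} \right)^2 = (0.40)(0.60) \left( \frac{1.96}{0.02} \right)^2 = 2304.96 $$


which, when rounded up, is 2305 people to interview. So the researcher must interview 
2305 people. 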


EXAMPLE 7-12 Home Computers 


In Example 7-11 assume that no previous study was done. Find the minimum sample 
size necessary to be accurate within 2% of the true proportion. 


Here we do not know the values of $\hat{p}$ and $\hat{q}$. So we use $\hat{p} = 0.5$ and $\hat{q} = 0.5$, 
with $E = 0.02$ and $z_{\alpha/2} = 1.96$: 


$$ n = \hat{p}\,\hat{q} \left( \frac{z_{\alpha/2}}{E} \right)^2 = (0.5)(0.5) \left( \frac{1.96}{0.02} \right)^2 = 2401 $$


Hence, 2401 people must be interviewed when $\hat{p}$ is unknown. This is 96 more people 
than needed if $\hat{p}$ is known. 


In determining the sample size, the size of the population is irrelevant. Only the 
degree of confidence and the margin of error are necessary to make the determination. 
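
Examples 7-11 and 7-12 can be checked side by side with a small Python sketch. The function name `min_sample_size_proportion` and the default `p_hat=0.5` are illustrative assumptions. The default mirrors the convention used above when no previous estimate is available. Exact fractions are used so that a result landing exactly on a whole number, such as 2401, is not pushed up by floating-point noise.

```python
from fractions import Fraction
import math


def min_sample_size_proportion(z, margin, p_hat=0.5):
    """Minimum n for estimating a proportion: n = p*q*(z/E)^2, rounded up."""
    # Convert through str() so decimal inputs like 0.02 are represented exactly.
    z, margin, p_hat = (Fraction(str(v)) for v in (z, margin, p_hat))
    q_hat = 1 - p_hat
    return math.ceil(p_hat * q_hat * (z / margin) ** 2)


# Example 7-11: previous study gives p_hat = 0.40
n_known = min_sample_size_proportion(z=1.96, margin=0.02, p_hat=0.40)
# Example 7-12: no previous study, so p_hat = 0.5 is used
n_unknown = min_sample_size_proportion(z=1.96, margin=0.02)

print(n_known, n_unknown, n_unknown - n_known)  # 2305 2401 96
```

Because the product of p and q is largest at 0.5, using that value gives the most conservative (largest) sample size.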

Basic Sample Size Determination for Mean 
95% confidence interval, Z = 1.96 
S = 0.75, E = 5% or less than 5% 


$$ n = \frac{Z^2 S^2}{E^2} = \frac{(1.96)^2 (0.75)^2}{(0.05)^2} = 864.36 $$


which rounds up to 865. 


S = estimate of standard deviation = 0.75 


E = acceptable margin of error = 0.05 or 0.03 

Basic Sample Size Determination for Proportion 


$$ n = \frac{Z^2 p q}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} \approx 384 $$


Z = 1.96 
p = maximum possible proportion (0.5) 
q = 1 - maximum possible proportion = 1 - 0.5 = 0.5 
E = acceptable margin of error for proportion = 0.05 


Assuming a response rate of 65%, the required minimum sample 
size should be calculated based on the following: 
Response rate = 65% 
Non-response rate = 35% 


Sample size adjusted for response rate = 384/0.65 ≈ 590.8, rounded up to 591 


Therefore, a minimum sample size of 591 should be used. 
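
The response-rate adjustment is a single division followed by rounding up. A minimal Python sketch follows; the function name `adjust_for_response_rate` is a hypothetical label for this step.

```python
from fractions import Fraction
import math


def adjust_for_response_rate(n_required, response_rate):
    """Number of people to contact so that about n_required respond."""
    # Exact fractions avoid floating-point surprises before rounding up.
    n_required = Fraction(str(n_required))
    response_rate = Fraction(str(response_rate))
    return math.ceil(n_required / response_rate)


# 384 completed responses needed, 65% of contacted people expected to respond
n_to_contact = adjust_for_response_rate(384, 0.65)
print(n_to_contact)  # 591
```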

Sample Size Determination Table 

Table (1) Table for Determining Minimum Returned Sample Size 
for a Given Population Size for Continuous and Categorical Data 


Column groups: continuous data (margin of error = .03) at alpha = .10 (t = 1.65) and 
alpha = .05 (t = 1.96); categorical data (margin of error = .05, p = .50) at t = 2.58. 


Population size    Sample size (categorical data, p = .50, t = 2.58) 
100                87 
200                154 
300                207 
400                250 
500                286 

Table (2) Minimum Number of Regressors Allowed for 
Sampling Example 


Sample size for:              Maximum number of regressors if ratio is 10 to 1: 
Continuous data: n = 111      11 
Categorical data: n = 313     31 

The manager of an electrical outlet wants to investigate the level of satisfaction among 
customers who purchased refrigerators from the outlet in the past 12 months. From the 
sales records of 480 customers the manager selects a random sample of 80 customers. The 
following table shows the number of customers for each brand of refrigerators. 


Brand of refrigerators    Number of customers    Sample customers 
                          150                    150/480*80 = 25 
                          90                     90/480*80 = 15 
                                                 20 


(i) State the population of interest. 
Customers who purchased the refrigerators from the outlet in the past 12 months. 
(ii) State the variable of interest in the study. 
Level of satisfaction among customers. 
(iii) What is the sampling method used by the manager? 
Stratified random sampling. 
(iv) Calculate the number of customers selected as samples for each brand. 
Number of customers for the brand / 480 * 80, as shown in the "Sample customers" column of the table. 
(v) Describe how to select the customers for the brand using systematic sampling. 
K = N/n = 480/80 = 6 
First sample: a random number A from 1 to 6, for example A = 2 
Sample numbers for A = 2: 2, 8, 14, 20, 26, 32, ... 
Sample numbers in general: A, A+6, A+12, A+18, ..., A+474 
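
Both calculations in this exercise, the proportional allocation per brand and the systematic selection, can be sketched in Python. The function names `proportional_share` and `systematic_sample` are illustrative, and only the stratum sizes that appear in the table (150 and 90) are used. The random start is drawn with `random.randint` unless a start such as A = 2 is supplied.

```python
import random


def proportional_share(stratum_size, population_size, sample_size):
    """Sample units allocated to one stratum under proportional allocation."""
    return round(stratum_size * sample_size / population_size)


def systematic_sample(population_size, sample_size, start=None):
    """Return 1-based positions chosen by systematic sampling."""
    k = population_size // sample_size      # sampling interval K = N/n
    if start is None:
        start = random.randint(1, k)        # random start A between 1 and K
    return [start + i * k for i in range(sample_size)]


# Brands with 150 and 90 customers out of 480, total sample of 80
share_150 = proportional_share(150, 480, 80)
share_90 = proportional_share(90, 480, 80)
print(share_150, share_90)  # 25 15

# Start A = 2, as in the worked answer
positions = systematic_sample(480, 80, start=2)
print(positions[:6], positions[-1])  # [2, 8, 14, 20, 26, 32] 476
```

With A = 2 the last selected position is 2 + 474 = 476, which matches the general pattern A, A+6, ..., A+474.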


Thank you 